#### Reliability concerns with the SNS Machine Protection System

Doug Curry, Eric Breeding, Alan Justice

ARW May 1, 2015



ORNL is managed by UT-Battelle for the US Department of Energy

### **Overview**

- Leading causes of reliability issues at SNS
  - False trip reports
  - Excessive trip delays
  - Failure to deliver fault status
- Additional reliability concerns at SNS
- 2<sup>nd</sup> Generation Highlights



# False trip reporting

- Excessive common-mode currents on input channels



- Mitigation techniques
  - Redesigned interface module with integrated ferrites that cover broad frequency range
  - Replaced power supply opto-couplers with parts that have lower dynamic range but higher CMRR



## **Excessive system delays**





## Failure to deliver fault status

- Output enable stuck high



- Provides a false MPS status to the top level node and permits continued beam production
- Firmware implementation did not match the architecture approved by the review committee
  - Single clock source generated at the lowest node in the chain
    - Each node simply grants or prevents the passing of the permissive signal
  - Passed local clock reference instead of original clock source
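The difference between the approved architecture and the firmware as implemented can be illustrated with a small model. This is only a sketch under stated assumptions: the `Node` class, the `regenerates_clock` flag, and `permit_at_top` are hypothetical names invented for illustration, and the real firmware logic is not described in enough detail here to reproduce it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True             # True: node grants the permissive
    regenerates_clock: bool = False  # hypothetical flag modelling the flaw


def permit_at_top(chain):
    """Return True if the top-level node still sees the permissive clock.

    `chain` is ordered from the lowest node to the top-level node.
    """
    clock_present = True  # single clock source generated at the lowest node
    for node in chain:
        if node.regenerates_clock:
            # Flawed behaviour: a local clock reference is forwarded, so
            # the state of every node below this one is no longer visible.
            clock_present = node.healthy
        else:
            # Approved behaviour: gate the incoming clock on local health.
            clock_present = clock_present and node.healthy
    return clock_present


approved = [Node("field", healthy=False), Node("mid"), Node("top")]
flawed = [Node("field", healthy=False),
          Node("mid", regenerates_clock=True),
          Node("top")]
```

In this model the approved chain drops the permissive when the field node faults, while the flawed chain still reports a good status at the top, which is the false MPS status described above.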



## **Reporting false positive MPS status**





## Additional reliability issues at SNS

- Low current drivers
  - Increased complexity
    - Non-standard configuration
- High density connectors
  - Bent pins: "stuck high" status caused by a pin shorted to the power pin
- PLC interfaces
  - Single digital output interface "solid state output"
- Operating System errors
  - Software interlocks
- Hardware Assembly inconsistencies



# Methods for improving system reliability

- Power measurement checks performed every 6 months
  - Plot the output power of fiber transmitters over time to identify degrading components
- 100% system verification checks performed after every extended outage
- Standardized testing procedures for improved consistency
- Equipment status tags
  - Red, yellow, and green tags
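The fiber transmitter trending described above could be sketched as a simple least-squares slope per transmitter. The transmitter names, the readings, and the `-0.5` dB/year limit below are made-up illustrative values, not SNS data or an SNS acceptance criterion.

```python
def slope_per_year(times_years, power_dbm):
    """Least-squares slope of output power (dBm) versus time (years)."""
    n = len(times_years)
    mean_t = sum(times_years) / n
    mean_p = sum(power_dbm) / n
    num = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(times_years, power_dbm))
    den = sum((t - mean_t) ** 2 for t in times_years)
    return num / den


def flag_degrading(history, limit_db_per_year=-0.5):
    """Return names of transmitters whose power falls faster than the limit."""
    return sorted(name for name, (t, p) in history.items()
                  if slope_per_year(t, p) < limit_db_per_year)


# Hypothetical readings taken every 6 months (time in years, power in dBm).
history = {
    "TX-A": ([0.0, 0.5, 1.0, 1.5], [-3.0, -3.0, -3.1, -3.0]),
    "TX-B": ([0.0, 0.5, 1.0, 1.5], [-3.0, -3.6, -4.1, -4.8]),
}
degrading = flag_degrading(history)
```

A fitted slope is less sensitive to a single noisy reading than comparing only the first and last measurements.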



# Modernization is imperative to achieve high reliability and maintainability

- Aging hardware increases the probability of failures within the MPS.
- Failures must be mitigated quickly to achieve the high availability goals of the SNS.
- Unfortunately, the MPS contains several obsolete components. Component obsolescence prevents the acquisition and maintenance of a safe supply of spare parts.
- As failures become more likely, the existing inventory of spare parts will become depleted.



#### Modernization affords a unique opportunity to improve MPS reliability and functionality

- Failure points are reduced through simplification of the Fast Protect System (FPS) architecture.
- Functionality required for second target station (STS) may be fully implemented with the new FPS hardware.
- Lessons learned over years of operating the accelerator may be applied with new FPS hardware.
- Development of better in-system diagnostics will be feasible with modern components and devices.



#### **Proposed Architecture Achieves a Significant Reduction in Hardware**

|                                | Current System | Proposed System |
|--------------------------------|----------------|-----------------|
| MPS Chassis                    | 102            | 60              |
| Interface Chassis              | 20             | 0               |
| Programmable Interface Chassis | 5              | 0               |
| VME Chassis                    | 8              | 0               |
| VME Computer                   | 8              | 0               |
| Rear Transition Board          | 102            | 0               |
| VME PMC Expansion Boards       | 24             | 0               |
| Field Node Cards \*\*FPGA      | 102            | 60              |
|                                |                |                 |
| MPS Master Chassis             | 1              | 1               |
| MPS Trigger Control Chassis    | 1              | 0               |
| MicroTCA Cards                 | 0              | 4               |
| Linux Workstation/uTCA IOC     | 0              | 1               |
| VME Computer                   | 1              | 0               |
|                                |                |                 |
| Total Hardware Count           | 382            | 126             |

Legacy data courtesy of Alan Justice

#### 67% Reduction in Failure Points
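As a check on the headline figure, using the hardware totals stated in the table above:

$$
\frac{382 - 126}{382} = \frac{256}{382} \approx 0.67
$$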



#### **Consolidation of Hardware Into Fewer Entities Mitigates Cable-Induced Failures**

|                             | Current System | Proposed System |
|-----------------------------|----------------|-----------------|
| SCSI Cables                 | 206            | 0               |
| Fiber Links                 | 204            | 65              |
| Timing Links                | 226            | 2               |
| Miscellaneous MPS Links     | 14             | 7               |
| Total Number of Interconnects | 650          | 74              |

Data courtesy of Alan Justice

#### 89% Reduction in Failure Points
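Using the interconnect totals from the table above, the reduction works out as:

$$
\frac{650 - 74}{650} = \frac{576}{650} \approx 0.886
$$

which rounds to the 89% quoted.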



#### The Field Node Has Significantly Fewer Components and Interconnects



Figures courtesy of Alan Justice

Hardware assembly



- No software interfacing to EPICS at field nodes
- No RTDL/Event Link
- No VME SBC



# **Current configuration**

Front

#### Required hardware for Deployment

- VME Crate
- VME IOC
- MPS chassis
- Teknobox PMC card
- \*\*VME PMC Span
- SCSI Cables
- RTDL and Event Link cables
- Rear Transition Card



#### High density connections

Rear



#### Leverage Readily Available COTS Hardware To Expedite Deployment



# SNS 2<sup>nd</sup> Generation System

- System integrity checks prior to the production of each beam pulse
  - Verify one channel per machine cycle
  - Full MPS system verification checks completed every 30 seconds for 1800 input channels
  - 100% status reports from all interface modules required prior to cycle production
  - SFP "fiber" status
- Standardized modules
  - Hot swappable for increased interface flexibility and maintainability "leave chassis in place"
- Increased coupling with the Timing System interface to improve post-mortem event analysis
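The 30-second figure follows from verifying one channel per machine cycle across 1800 channels. The sketch below assumes a 60 Hz machine cycle rate, which is not stated on the slide but is consistent with its numbers; the function names are hypothetical.

```python
def full_scan_seconds(channels, cycle_rate_hz, channels_per_cycle=1):
    """Time to verify every channel once at a fixed number per cycle."""
    cycles = -(-channels // channels_per_cycle)  # ceiling division
    return cycles / cycle_rate_hz


def channel_for_cycle(cycle_index, channels):
    """Round-robin choice of the channel to verify on a given cycle."""
    return cycle_index % channels


# Assumed 60 Hz cycle rate with the 1800 input channels from the slide.
scan_time = full_scan_seconds(channels=1800, cycle_rate_hz=60)
```

With these inputs `scan_time` is 30.0 seconds, and `channel_for_cycle` wraps back to channel 0 on cycle 1800.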



#### The New Architecture Shall Reside Within Legacy Infrastructure

- The proposed FPS will be "form, fit and function compatible" with the legacy system to the greatest extent possible.
- The new FPS hardware shall reside in equipment racks that house the legacy FPS.
- Each FPS node shall accept existing input field wiring.
- All modules shall be compatible down to the pin level with the legacy FPS chassis and all field inputs.



**Questions???** 


